Problem Set 3
Bayesian workflow
Recall the 7 steps of our Bayesian workflow. Remember that some steps are relatively quick and straightforward (Model specification, Sampling, and Diagnostics – at least for these models), while others are more involved (Priors and Posteriors).
- Model specification
- Prior specification
- Prior predictive simulation / check
- Sampling
- Diagnostics
- Posterior predictive simulation
- Summarizing the posterior
We will work through these steps with a few different linear models. All of these models have only 1 predictor and are ones that we used in Quantitative Methods 1, either for lecture or in problem sets. So the data should be familiar even if the conclusions are not fresh in your mind.
In this problem set we will be very explicit about following the steps. This is good to practice and learn as you are gaining experience with Bayesian inference.
Difference of means with two groups
Our first model will compare the means of two groups. This is like a two-sample t-test. It is one of the most common statistical tests, also sometimes called an “independent samples” t-test.
Rather than thinking of it as a t-test, just consider it a linear model with a single predictor that is a categorical variable with two levels. We know that’s a lot more convoluted sounding, but trying to move away from “named tests” is worthwhile in the long run.
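To see concretely that the "named test" and the linear model are the same thing, here is a minimal base-R sketch on simulated data (these values are illustrative, not the lizard data): with a two-level categorical predictor, the intercept from lm() is the first group's mean and the second coefficient is the difference in means.

```r
# Illustrative sketch (simulated data, not the lizard data): a
# two-sample comparison is a linear model with one two-level factor.
set.seed(1)
d <- data.frame(
  group = rep(c("alive", "dead"), each = 20),
  y = c(rnorm(20, mean = 24, sd = 2), rnorm(20, mean = 22, sd = 2))
)
fit <- lm(y ~ group, data = d)
means <- tapply(d$y, d$group, mean)

# The intercept equals the "alive" group mean; the second coefficient
# equals the difference in means (dead - alive).
coef(fit)
means
```

This is why moving away from named tests pays off: the same linear-model machinery extends directly to the Bayesian models below.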
The data we will use involve the effects (or lack thereof) of predation on flat-tail horned lizards (Phrynosoma mccalli). The hypothesis is that longer horns are protective against predation by loggerhead shrikes (Lanius ludovicianus).
The parietal horn length of two groups of horned lizards was measured by Young et al. (2004)1. One group of lizards was alive, and the other group had been killed by shrikes and left in trees or bushes.
Read in the data included in the file HornedLizards.csv. Drop any rows with NA values. You should have 184 rows remaining.
# FIXME
HL <- read_csv("../Data/HornedLizards.csv", show_col_types = FALSE) |>
drop_na()
HL
# A tibble: 184 × 2
horn_length group
<dbl> <chr>
1 25.2 alive
2 26.9 alive
3 26.6 alive
4 25.6 alive
5 25.7 alive
6 25.9 alive
7 27.3 alive
8 25.1 alive
9 30.3 alive
10 25.6 alive
# … with 174 more rows
Plot a histogram of the two groups, faceting rows by group so that the distributions are easier to compare.
# FIXME
ggplot(HL, aes(horn_length, fill = group)) +
geom_histogram(bins = 30) +
scale_fill_manual(values = c("tomato", "steelblue")) +
facet_grid(group ~ .) +
cowplot::theme_cowplot() +
labs(x = "Horn Length (mm)", y = "Count") +
theme(legend.position = "none")
What do you observe?
There are many more alive samples than dead (the counts are higher overall). The mean for the dead group looks like it might be lower than for the alive group, but it's hard to tell. The overlap in the spreads is large, with the dead samples completely contained within the range of the alive samples.
Notice that there are a few "alive" lizards with very small horns. In a regular t-test, these observations might be problematic – mathematically problematic in that they drag the mean down and inflate the variance, rather than scientifically problematic in that they are erroneous. In statistical parlance, these observations have large "weight" or "leverage".
The regularizing priors that we will use will have no problem with these observations. They will be automatically down-weighted in the analysis, because they are relatively less likely than the majority of the observations.
In the chunk below, check sample size, means, and variances for each group:
# FIXME
HL |> count(group)
# A tibble: 2 × 2
group n
<chr> <int>
1 alive 154
2 dead 30
HL |>
group_by(group) |>
summarize(Mean = mean(horn_length),
Var = var(horn_length))
# A tibble: 2 × 3
group Mean Var
<chr> <dbl> <dbl>
1 alive 24.3 6.92
2 dead 22.0 7.34
You will observe that there are about 5 times more alive lizards than dead. t-tests work best when sample sizes are fairly close and assume that the variances are equal between groups. Although the variances are very close in this sample, that can be deceiving when the sample sizes are so different: a variance estimated from only 30 observations is much noisier than one estimated from 154, so apparent equality of the two sample variances is less reassuring than it looks. If you were going to use a t-test for these data, you would probably want to use Welch's correction for unequal variances, which is the default in R. There are also frequentist approaches that down-weight unusual observations, which fall under the general term "robust regression".
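The point about noisy variance estimates can be checked with a quick simulation (a sketch, using made-up values: one population with true variance 7, and the two sample sizes from the lizard data):

```r
# Illustrative sketch: variance estimates from small samples are much
# noisier than from large ones, so near-equal sample variances at
# n = 30 vs. n = 154 could easily arise by chance. Both sets of samples
# come from the same population with true variance 7.
set.seed(2)
v_small <- replicate(1e4, var(rnorm(30, sd = sqrt(7))))
v_large <- replicate(1e4, var(rnorm(154, sd = sqrt(7))))

sd(v_small)  # spread of the n = 30 variance estimates
sd(v_large)  # much smaller spread for n = 154
```

Both sets of estimates center on 7, but the small-sample estimates scatter far more widely around it.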
To prepare the data for passing to ulam(), we have to make a change to the formatting of the group variable:
- Recode group to be an integer (as.integer(factor(group)))
- Confirm that group is an integer with values of 1 and 2
# FIXME
HL <- HL |>
mutate(group = as.integer(factor(group)))
unique(HL$group)
[1] 1 2
Model specification
We will fit a linear model to these data. The basic model statements are:
\[\begin{align} \mathrm{horn\_length} & \sim Normal(\mu, \sigma) \\ \mu & = b[\mathrm{group}] \\ \end{align}\]
Translated, these lines say that horn_length is modeled by a normal distribution with a mean (\(\mu\)) and standard deviation (\(\sigma\)). The mean is estimated separately for each level of group by a predictor called \(b\).
We have written the model in LaTeX code. LaTeX (written as \(\LaTeX\) by the purists) is a markup language (like html, rmd, qmd, etc.) for typesetting math equations (and lots of other things, like dissertations). You have seen it periodically throughout these problem sets and in the source code for the slides, most often inline using $, as for the mean and standard deviation in the previous paragraph. The align environment allows formatting the equations with the operators lining up (where the & character is located). The two backslashes \\ mark new lines. \mathrm{} makes the text upright ("roman") although we are in math mode, which is useful to distinguish estimated parameters in italics from variable names.
If you find yourself doing this kind of quantitative analysis often, you will also find yourself writing about it often (maybe even writing manuscripts in quarto), and it’s worth spending the time to pick up the relevant LaTeX coding. The basics are pretty straightforward, and you can get very far with just a little bit of knowledge. There are lots of online tutorials and translators.
Prior specification and prior predictive check
We have not added the lines to the model for the priors yet, because we need to carry out the prior predictive check. In the chunk below, we have given you the basic model as described above.
- Set eval to true
- Add lines for the priors. Choose some reasonable values (mean and standard deviation) for a normal distribution for a_group and a half-normal for sigma (mean and standard deviation).
We are set to only sample the prior (sample_prior = TRUE), so all the samples will come only from the prior. We still have to pass the data, but it is ignored during sampling.
# FIXME
PP <- ulam(
alist(
horn_length ~ dnorm(mu, sigma),
mu <- b[group],
b[group] ~ dnorm(20, 7.5),
sigma ~ dhalfnorm(0, 5)
),
data = HL,
sample_prior = TRUE
)
Running MCMC with 1 chain, with 1 thread(s) per chain...
Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1 finished in 0.0 seconds.
Because we are only sampling the prior, we can use the default of 500 samples and 1 chain. This is enough samples to see if the values are reasonable.
Below we have provided you the code to extract the b parameter matrix from the samples. extract.samples() returns a list, and when you use [] notation, those parameters are returned as a matrix. We also convert it to a data.frame and rename the columns to match the original column names for the observed data.
- Set eval to true
- Plot the prior predictive distributions for both groups as histograms. You can use the same code that you wrote above to compare distributions of the observed data, just changing the data.
# FIXME
pp_dist <- extract.samples(PP)$b |>
as.data.frame() |>
rename(alive = V1,
dead = V2)
pp_dist |>
pivot_longer(cols = everything(),
names_to = "group", values_to = "horn_length") |>
ggplot(aes(horn_length, fill = group)) +
geom_histogram(bins = 30) +
scale_fill_manual(values = c("tomato", "steelblue")) +
facet_grid(group ~ .) +
cowplot::theme_cowplot() +
labs(x = "Horn Length (mm)", y = "Count") +
theme(legend.position = "none")
Try out different priors for a_group until you find a prior that seems good to you. Because we are not looking at the values for the sigma standard deviation (we just asked you to extract the b parameter matrix from the samples, ignoring sigma for now), changing its value won't alter the pattern. If we wanted to, we could choose rows from both priors and simulate data using the full set of parameters, which would be the most correct way to do this simulation. But we should be fine just using values for horn_length.
Final model specification
Add the values that you chose for the priors to the model statement below:
FIXME
\[\begin{align} \mathrm{horn\_length} & \sim Normal(\mu, \sigma) \\ \mu & = b[\mathrm{group}] \\ b[\mathrm{group}] & \sim Normal(20, 7.5) \\ \sigma & \sim HalfNormal(0, 5) \end{align}\]
Sampling
Finally, we can sample the model. Copy the ulam() code from above and make a few modifications:
- Rename the model to something besides PP (we like fm for "fitted model")
- Either set sample_prior to FALSE or delete the line completely.
- Set chains = 4 to use 4 replicate chains
- Set iter = 5e3 to sample each chain for 5,000 iterations (2,500 warmup and 2,500 post-warmup)
- Set refresh = 1e3 to give fewer output lines
This sampling regime will give us 10,000 draws after warmup.
# FIXME
fm <- ulam(
alist(
horn_length ~ dnorm(mu, sigma),
mu <- b[group],
b[group] ~ dnorm(20, 7.5),
sigma ~ dhalfnorm(0, 5)
),
data = HL,
chains = 4,
iter = 5e3,
refresh = 1e3
)
Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
Chain 1 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 1 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 1 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 1 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 1 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 1 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 1 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 1 finished in 0.1 seconds.
Chain 2 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 2 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 2 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 2 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 2 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 2 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 2 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 2 finished in 0.1 seconds.
Chain 3 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 3 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 3 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 3 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 3 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 3 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 3 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 3 finished in 0.1 seconds.
Chain 4 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 4 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 4 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 4 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 4 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 4 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 4 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.9 seconds.
Make note of any warnings or errors from ulam() or stan.
Diagnostics
Pass the model object to either summary() or precis(). Check for adequate sampling as evidenced by sufficiently large n_eff and Rhat4 of ~1.
# FIXME
summary(fm)
Inference for Stan model: ulam_cmdstanr_136b8d1952830251a1eaf6658ba4ae71-202303021930-1-212050.
4 chains, each with iter=5000; warmup=2500; thin=1;
post-warmup draws per chain=2500, total post-warmup draws=10000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b[1] 24.28 0.00 0.22 23.85 24.13 24.28 24.43 24.70 9262 1
b[2] 21.98 0.01 0.49 21.00 21.65 21.98 22.31 22.93 9477 1
sigma 2.66 0.00 0.14 2.40 2.56 2.65 2.75 2.96 9237 1
lp__ -270.78 0.02 1.26 -274.07 -271.33 -270.46 -269.87 -269.33 4480 1
Samples were drawn using NUTS(diag_e) at Thu Mar 02 19:30:06 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
precis(fm, depth = 2)
 mean sd 5.5% 94.5% n_eff Rhat4
b[1] 24.279141 0.2181064 23.923700 24.626911 9262.357 0.9998247
b[2] 21.978164 0.4898636 21.187656 22.744105 9477.263 1.0003767
sigma 2.657557 0.1416490 2.441096 2.891888 9236.915 0.9999238
What do you learn from the summary?
Everything looks good. We have almost 10,000 effective samples for each parameter. Rhat is 1.
Now check the sampling visually by making a trace plot (traceplot()) and rank histogram plot (trankplot()).
# FIXME
traceplot(fm)
trankplot(fm)
What do you learn from the visual checks?
Everything looks good. The traceplot shows that values move quite a lot very early in the sampling, but quickly stabilize to their posterior distributions. The rank histogram plot also looks pretty flat. It’s scaled to take up most of the plot window, but there are no obvious areas of concern.
Taken together, do you think that the sampling was successful?
From the summaries to the plots, everything seems to agree that sampling went well.
Posterior predictive check
First we would like to compare the prior to the posterior. You wouldn’t necessarily always do this, but it is instructive to see how they compare – essentially how much information the data contains.
We would like to get the two sets of samples into the same R object with a new column (PP for prior/posterior) that specifies whether the samples came from the prior or the posterior.
- Add a column to pp_dist from above named PP with the value "Prior". The PP column will denote which set of samples we are using.
- Extract the posterior by adapting the code we used above for the prior. Then add the same PP column, but set to "Posterior".
- To get the two sets of samples into the same object, pass both to bind_rows().
- Now pivot the full set longer using all variables except PP. Set the names to group and the values to horn_length.
# FIXME
pp_dist <- pp_dist |>
mutate(PP = "Prior")
post <- extract.samples(fm)$b |>
as.data.frame() |>
rename(alive = V1,
dead = V2) |>
mutate(PP = "Posterior")
samples <- bind_rows(pp_dist, post) |>
pivot_longer(cols = -PP, names_to = "group", values_to = "horn_length")
Now make a density plot of all the samples (adapting the code from above):
- Encode color with group
- Encode linetype with PP
Your plot should have four lines, two solid lines from the posterior and two dashed lines from the prior.
# FIXME
ggplot() +
geom_density(data = samples, aes(horn_length, color = group,
linetype = PP), linewidth = 1) +
scale_color_manual(values = c("tomato", "steelblue")) +
cowplot::theme_cowplot() +
labs(x = "Horn Length (mm)", y = "Density")
How does the posterior compare to the prior?
The priors are broad and relatively flat in comparison to the posteriors. The posteriors are very compressed and do not overlap much. The priors overlap perfectly (they should because we used the same prior for both).
Next we would like to compare the posterior to the data. We will mostly let you try to figure this out. A few suggestions:
- Reload the HornedLizards.csv file (copy your code from above). This has the original codings of "alive" and "dead" rather than those recoded as 1 and 2.
- The observed data are already in long format, so just add the PP column as "Observed".
- Use your posterior object from above and pivot that longer as above.
- Add the data separately by using geom_density() separately for each set of data. Set the linetype manually (i.e., outside of aes()).
# FIXME
HL <- read_csv("../Data/HornedLizards.csv", show_col_types = FALSE) |>
drop_na() |>
mutate(PP = "Observed")
post_long <- post |>
pivot_longer(cols = -PP, names_to = "group", values_to = "horn_length")
ggplot() +
geom_density(data = HL, aes(horn_length, color = group),
linetype = "solid", linewidth = 1) +
geom_density(data = post_long, aes(horn_length, color = group),
linetype = "dotted", linewidth = 1) +
scale_color_manual(values = c("tomato", "steelblue")) +
cowplot::theme_cowplot() +
labs(x = "Horn Length (mm)", y = "Density")
How does the posterior for each group compare to the observed data?
Their central tendencies are very similar, but the spread of the observed data is much larger. This is to be expected: for the posterior, we are plotting the distributions of the expected values for the two group means. These values do not include the standard deviation (sigma). We would have to do things slightly differently to include the variation encoded by sigma. We will do this step later in the course (using the standard deviation generates a "posterior predictive distribution").
Summarizing the posterior
Compare the means of horn length in the observed data to the means (or medians) of the posterior estimates.
# FIXME
HL |>
group_by(group) |>
summarize(Mean = mean(horn_length))
# A tibble: 2 × 2
group Mean
<chr> <dbl>
1 alive 24.3
2 dead 22.0
precis(fm, depth = 2)
 mean sd 5.5% 94.5% n_eff Rhat4
b[1] 24.279141 0.2181064 23.923700 24.626911 9262.357 0.9998247
b[2] 21.978164 0.4898636 21.187656 22.744105 9477.263 1.0003767
sigma 2.657557 0.1416490 2.441096 2.891888 9236.915 0.9999238
How do they compare?
They are very close.
It is important to remember the conceptual differences between frequentist analysis (like a t-test) and Bayesian inference. The underlying frequentist assumption is that these samples of lizards are drawn from populations with fixed means (i.e., a fixed true mean value for horn length in each group). The Bayesian approach views the data as fixed (i.e., this is what we observe) and seeks the parameter estimates that are most compatible with those data. The numerical answers are very close (at least for this example), but the assumptions and framework are quite different. If you run t.test() on these data, the means that are compared are just the group means. Our Bayesian analysis gives similar values, but these are posterior estimates (not simply the directly calculated group means).
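A quick sketch of the t.test() point: its reported estimates are exactly the directly calculated group means. The data below are simulated to resemble the lizard numbers (illustrative values only, not the real data set):

```r
# Illustrative sketch (simulated data resembling the lizard summary
# statistics): the point estimates reported by t.test() are exactly
# the two directly calculated group means.
set.seed(3)
horn <- c(rnorm(154, mean = 24.3, sd = 2.6),
          rnorm(30, mean = 22.0, sd = 2.6))
grp <- rep(c("alive", "dead"), times = c(154, 30))

tt <- t.test(horn ~ grp)  # Welch's correction is R's default

tt$estimate               # the two group means
tapply(horn, grp, mean)   # identical values
```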
We aren’t really interested in the posterior means themselves. We are interested in the difference between them (called a “contrast”). This difference tests the hypothesis that the means of the two groups are different (or not) from one another.
Calculate the difference between dead and alive using the posterior. You can just add a column to the tibble with the posterior.
# FIXME
post <- post |>
mutate(dead_minus_alive = dead - alive)
Make a density plot of the difference you calculated.
# FIXME
ggplot(post, aes(dead_minus_alive)) +
geom_density() +
cowplot::theme_cowplot() +
labs(x = "Dead - Alive (mm)", y = "Density")
What is your interpretation of the plot?
There appears to be a robust, credible difference between the means of the dead and alive groups. The peak of the density is about -2 mm, with most of the distribution between -1 and -3.5 mm.
Finally, calculate the 89% highest density interval for the difference in any way you want (HPDI() from rethinking is fine for example).
# FIXME
HPDI(post$dead_minus_alive)
|0.89 0.89|
-3.1287 -1.4156
Does the interval agree with your assessment of the plot?
Yes, the interval runs from about -3.13 to -1.42 mm. So all of the density is far from 0. We don't need to worry about a ROPE or anything here.
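If you want to see what HPDI() is doing under the hood, here is a minimal sketch of a highest-density-interval calculation (the function name hpdi and the simulated draws are ours for illustration; rethinking's HPDI() uses coda internally but follows the same idea): among all windows of sorted samples that contain the target probability mass, take the narrowest one.

```r
# Sketch of a highest-density interval: sort the samples, slide a
# window containing `prob` of them, and keep the narrowest window.
hpdi <- function(samples, prob = 0.89) {
  x <- sort(samples)
  n <- length(x)
  k <- max(1, floor(prob * n))
  widths <- x[(k + 1):n] - x[1:(n - k)]  # width of every candidate window
  i <- which.min(widths)                 # narrowest window wins
  c(lower = x[i], upper = x[i + k])
}

# Simulated draws standing in for a posterior contrast
set.seed(4)
draws <- rnorm(1e4, mean = -2.3, sd = 0.5)
hpdi(draws)
```

For a symmetric posterior this agrees closely with a central (percentile) interval; the two diverge when the posterior is skewed.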
Bivariate regression
Our second model is a bivariate regression, a linear model predicting one continuous outcome variable by another continuous variable. This is an ordinary least-squares-like model. The data come from Tomkins and Brown (2004)2, who studied the proportion of European earwigs (Forficula auricularia) with forceps as a function of population density.
The hypothesis (among others that these authors tested) is that at higher population densities, earwigs are more likely to have pincers, which are used for fighting and courtship.
The data are in Earwigs.csv. There are 22 observations of 2 variables:
- Density: The mean number of earwigs caught in traps (a proxy for population density) across different islands in Scotland.
- Proportion_forceps: The mean proportion of earwigs in that area with forceps
Load the data and plot the relationship between the proportion of earwigs with forceps and population density.
# FIXME
EW <- read_csv("../Data/Earwigs.csv", show_col_types = FALSE)
ggplot() +
geom_point(data = EW, aes(Density, Proportion_forceps),
color = "steelblue", size = 3)
Model specification
The basic model we want to fit models Proportion_forceps as a normal distribution. The mean \(\mu\) is a linear function of an intercept b0 plus a slope b1 that is a function of Density:
\[\begin{align} \mathrm{Proportion\_forceps} & \sim Normal(\mu, \sigma) \\ \mu & = b0 + b1 \cdot \mathrm{Density} \\ \end{align}\]
From here on, follow the same steps that you went through for the first set of data. We have given you an outline of the steps with the headings. You can (should) adapt the code you wrote above, changing/adding/removing parts so that you set up the linear model described above. Copy the code chunks and change. Test the code as you go to make sure that everything makes sense.
We advise you to restart your R session before you start this section. Load the packages at the top and then begin with loading the earwigs data. That way you will not have any leftover objects in your session that might get called by the code (i.e., the old variables won’t exist and so you will be sure to get errors if you forget to change something).
For each section we have provided a few guiding questions (to think about but not necessarily answer), but we will leave everything from here on up to you.
Prior specification and prior predictive check
- Should the intercept be large or small? Does the model need an intercept at all, considering it from a biological standpoint?
- Think about the range of the data in x and y. Do you expect the b1 slope parameter to be large or small? What about sigma?
- What does a prior predictive check look like when the priors define lines, rather than means like above? You might have to expand the y-axis to see the prior predictions better when you plot them.
# FIXME
PP <- ulam(
alist(
Proportion_forceps ~ dnorm(mu, sigma),
mu <- b0 + b1 * Density,
b0 ~ dnorm(0, 0.1),
b1 ~ dnorm(0, 0.01),
sigma ~ dhalfnorm(0, 0.1)
),
data = EW,
sample_prior = TRUE
)
Running MCMC with 1 chain, with 1 thread(s) per chain...
Chain 1 Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1 Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1 Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1 Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1 Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1 Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1 Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1 Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1 Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1 Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1 Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1 Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1 finished in 0.0 seconds.
pp_dist <- extract.samples(PP) |>
as.data.frame() |>
slice_sample(n = 100)
ggplot() +
geom_point(data = EW, aes(Density, Proportion_forceps),
color = "steelblue", size = 3) +
geom_abline(intercept = pp_dist$b0, slope = pp_dist$b1,
color = "tomato",
alpha = 0.5) +
ylim(c(-1, 1))
Final model specification
Add the values that you chose for the priors for b0, b1, and sigma to the model statement below:
\[\begin{align} \mathrm{Proportion\_forceps} & \sim Normal(\mu, \sigma) \\ \mu & = b0 + b1 \cdot \mathrm{Density} \\ b0 & \sim Normal(0, 0.1) \\ b1 & \sim Normal(0, 0.01) \\ \sigma & \sim HalfNormal(0, 0.5) \\ \end{align}\]
Sampling
- How many draws do you need to make from the model to get a good representation of the posterior?
# FIXME
fm <- ulam(
alist(
Proportion_forceps ~ dnorm(mu, sigma),
mu <- b0 + b1 * Density,
b0 ~ dnorm(0, 0.1),
b1 ~ dnorm(0, 0.01),
sigma ~ dhalfnorm(0, 0.5)
),
data = EW,
chains = 4,
iter = 5e3,
refresh = 1e3
)
Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
Chain 1 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 1 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 1 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 1 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 1 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 1 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 1 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 1 finished in 0.1 seconds.
Chain 2 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 2 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 2 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 2 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 2 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 2 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 2 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 2 finished in 0.1 seconds.
Chain 3 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 3 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 3 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 3 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 3 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 3 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 3 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 3 finished in 0.1 seconds.
Chain 4 Iteration: 1 / 5000 [ 0%] (Warmup)
Chain 4 Iteration: 1000 / 5000 [ 20%] (Warmup)
Chain 4 Iteration: 2000 / 5000 [ 40%] (Warmup)
Chain 4 Iteration: 2501 / 5000 [ 50%] (Sampling)
Chain 4 Iteration: 3500 / 5000 [ 70%] (Sampling)
Chain 4 Iteration: 4500 / 5000 [ 90%] (Sampling)
Chain 4 Iteration: 5000 / 5000 [100%] (Sampling)
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.9 seconds.
Diagnostics
- What visual and summary checks will you use to determine if the model fit adequately and without obvious errors?
# FIXME
summary(fm)
Inference for Stan model: ulam_cmdstanr_741b16a5963d6ecea96eeda1bb6d4065-202303021930-1-3c5789.
4 chains, each with iter=5000; warmup=2500; thin=1;
post-warmup draws per chain=2500, total post-warmup draws=10000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b0 0.13 0.00 0.05 0.03 0.10 0.14 0.17 0.23 4230 1
b1 0.01 0.00 0.00 0.00 0.00 0.01 0.01 0.01 4994 1
sigma 0.17 0.00 0.03 0.13 0.15 0.17 0.19 0.25 3950 1
lp__ 25.20 0.02 1.29 21.88 24.61 25.54 26.15 26.68 3562 1
Samples were drawn using NUTS(diag_e) at Thu Mar 02 19:30:15 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
precis(fm)
 mean sd 5.5% 94.5% n_eff Rhat4
b0 0.134556438 0.050774708 0.053611102 0.214171035 4230.169 1.000534
b1 0.005884063 0.001619751 0.003366997 0.008525237 4993.522 1.000636
sigma 0.174061089 0.030253262 0.134014790 0.226734045 3950.106 1.000492
# FIXME
traceplot(fm)
trankplot(fm)
Posterior predictive simulation
- Similar to the prior predictive check, what does the posterior predictive check look like when the posterior samples define lines?
# FIXME
post <- extract.samples(fm) |>
as.data.frame()
post_sub <- post |>
slice_sample(n = 100)
ggplot() +
geom_point(data = EW, aes(Density, Proportion_forceps),
color = "steelblue", size = 3) +
geom_abline(intercept = post_sub$b0, slope = post_sub$b1,
color = "tomato",
alpha = 0.5)
Summarizing the posterior
- What hypothesis are you testing with these data?
- How will you summarize the posterior in a way that will test that hypothesis?
HPDI(post$b1)
|0.89 0.89|
0.00323666 0.00836196
# FIXME
# With a more constrained intercept and less constrained slope
fm <- ulam(
alist(
Proportion_forceps ~ dnorm(mu, sigma),
mu <- b0 + b1 * Density,
b0 ~ dnorm(0, 0.01),
b1 ~ dnorm(0, 0.05),
sigma ~ dhalfnorm(0, 0.5)
),
data = EW,
chains = 4,
iter = 5e3,
refresh = 0
)
Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.4 seconds.
precis(fm)
 mean sd 5.5% 94.5% n_eff Rhat4
b0 0.003902373 0.009985312 -0.011797374 0.02009075 5868.349 1.000227
b1 0.009027690 0.001368893 0.006872659 0.01122410 9495.022 1.000237
sigma 0.205031477 0.034222435 0.158261505 0.26388665 5537.174 1.000318
post <- extract.samples(fm) |>
as.data.frame()
post_sub <- post |>
slice_sample(n = 100)
ggplot() +
geom_point(data = EW, aes(Density, Proportion_forceps),
color = "steelblue", size = 3) +
geom_abline(intercept = post_sub$b0, slope = post_sub$b1,
color = "tomato",
alpha = 0.5)
HPDI(post$b1)
|0.89 0.89|
0.00686508 0.01120960
# FIXME
# With no intercept, leave out b0
fm <- ulam(
alist(
Proportion_forceps ~ dnorm(mu, sigma),
mu <- b1 * Density,
b1 ~ dnorm(0, 0.05),
sigma ~ dhalfnorm(0, 0.5)
),
data = EW,
chains = 4,
iter = 5e3,
refresh = 0
)
Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.
precis(fm)
 mean sd 5.5% 94.5% n_eff Rhat4
b1 0.009135142 0.001342284 0.006984585 0.01122803 6042.096 1.000227
sigma 0.206980881 0.035164451 0.159487395 0.26903137 3479.769 1.000594
post <- extract.samples(fm) |>
as.data.frame()
post_sub <- post |>
slice_sample(n = 100)
ggplot() +
geom_point(data = EW, aes(Density, Proportion_forceps),
color = "steelblue", size = 3) +
geom_abline(intercept = 0, slope = post_sub$b1,
color = "tomato",
alpha = 0.5)
HPDI(post$b1)
|0.89 0.89|
0.00695914 0.01119150
ANOVA-like linear model
The final model is an ANOVA-like model, with one categorical predictor with three levels. The data for this problem come from an experiment designed to test whether application of light to the eyes would result in a shorter time shift for the test subject.
The data are in JetLag.csv. There are two columns:
- Treatment: the categorical predictor with three levels. The "control" group had no light application. The "eyes" group had light in their eyes (the test condition). The "knee" group had light applied to the back of the knee as a negative control (there are no photoreceptors on the knee).
- Shift: the continuous outcome variable measured in hours
The hypothesis that you want to test is whether light application to the eyes resulted in a different mean shift than did light application to the knee. Thus we are only interested in two contrasts:
- knee vs. control
- eyes vs. control
The knee vs. eyes contrast is not interesting for this experiment.
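With three group means in the posterior, each contrast is just a draw-by-draw column difference, exactly as in the dead-minus-alive contrast above. A sketch with fake posterior draws (the column names and values here are illustrative, not fitted results):

```r
# Illustrative sketch: compute the two contrasts of interest from
# posterior draws of the three group means. These are fake draws,
# not output from the fitted model.
set.seed(5)
post <- data.frame(
  control = rnorm(1e4, mean = -0.3, sd = 0.2),
  knee    = rnorm(1e4, mean = -0.3, sd = 0.2),
  eyes    = rnorm(1e4, mean = -1.2, sd = 0.2)
)

# Each contrast is a full posterior distribution, not a single number.
post$knee_minus_control <- post$knee - post$control
post$eyes_minus_control <- post$eyes - post$control

colMeans(post[, c("knee_minus_control", "eyes_minus_control")])
```

You would then plot and summarize (e.g., with an HDI) each contrast distribution, just as for the lizard contrast.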
This model is very similar to the first one in this problem set. Below we have given you empty headings. Adapt the code from above to test this hypothesis.
# FIXME
JL <- read_csv("../Data/JetLag.csv", show_col_types = FALSE) |>
mutate(Treatment = factor(Treatment,
levels = c("control", "knee", "eyes")))
ggplot() +
geom_point(data = JL, aes(x = Treatment, y = Shift, color = Treatment),
position = position_jitter(width = 0.1, seed = 34879),
size = 3) +
scale_color_manual(values = c("tomato", "steelblue", "darkorchid")) +
labs(y = "Shift (h)") +
  theme(legend.position = "none")
Model specification
\[\begin{align} \mathrm{Shift} & \sim Normal(\mu, \sigma) \\ \mu & = b[\mathrm{Treatment}] \\ \end{align}\]
Prior specification and prior predictive check
# FIXME
PP <- ulam(
alist(
Shift ~ dnorm(mu, sigma),
mu <- b[Treatment],
b[Treatment] ~ dnorm(0, 3),
sigma ~ dhalfnorm(0, 2)
),
data = JL,
sample_prior = TRUE
)
Running MCMC with 1 chain, with 1 thread(s) per chain...
Chain 1 finished in 0.0 seconds.
pp_dist <- extract.samples(PP)$b |>
as.data.frame() |>
rename(control = V1,
knee = V2,
eyes = V3)
pp_dist |>
pivot_longer(cols = everything(),
names_to = "Treatment",
values_to = "Shift") |>
ggplot(aes(Shift, fill = Treatment)) +
geom_histogram(bins = 30) +
scale_fill_manual(values = c("tomato", "steelblue", "darkorchid")) +
facet_grid(Treatment ~ .) +
cowplot::theme_cowplot() +
labs(x = "Shift (h)", y = "Count") +
  theme(legend.position = "none")
Final model specification
\[\begin{align} \mathrm{Shift} & \sim Normal(\mu, \sigma) \\ \mu & = b[\mathrm{Treatment}] \\ b[\mathrm{Treatment}] & \sim Normal(0, 3) \\ \sigma & \sim HalfNormal(0, 2) \end{align}\]
Sampling
# FIXME
fm <- ulam(
alist(
Shift ~ dnorm(mu, sigma),
mu <- b[Treatment],
b[Treatment] ~ dnorm(0, 3),
sigma ~ dhalfnorm(0, 2)
),
data = JL,
chains = 4,
iter = 5e3,
refresh = 0
)
Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.
Diagnostics
# FIXME
summary(fm)
Inference for Stan model: ulam_cmdstanr_2e73e5719d31c59140137392db0cf28a-202303021930-1-714c82.
4 chains, each with iter=5000; warmup=2500; thin=1;
post-warmup draws per chain=2500, total post-warmup draws=10000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b[1] -0.30 0.00 0.27 -0.83 -0.48 -0.30 -0.13 0.23 9066 1
b[2] -0.33 0.00 0.29 -0.91 -0.52 -0.33 -0.15 0.24 9668 1
b[3] -1.53 0.00 0.29 -2.10 -1.73 -1.53 -1.34 -0.96 9686 1
sigma 0.75 0.00 0.13 0.55 0.66 0.73 0.83 1.07 6970 1
lp__ -4.48 0.03 1.58 -8.46 -5.21 -4.11 -3.33 -2.54 3791 1
Samples were drawn using NUTS(diag_e) at Thu Mar 02 19:30:33 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
precis(fm, depth = 2)
            mean        sd       5.5%      94.5%    n_eff     Rhat4
b[1]  -0.3027944 0.2660556 -0.7253620  0.1105382 9065.841 0.9998279
b[2]  -0.3306119 0.2883560 -0.7884509  0.1274212 9668.304 1.0005716
b[3]  -1.5327436 0.2917025 -1.9901000 -1.0779945 9685.551 1.0000301
sigma  0.7533065 0.1325935  0.5777423  0.9909862 6970.403 1.0000077
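As you gain experience, it helps to make the diagnostics step explicit rather than scanning the table by eye. A hedged base-R sketch (the 1.01 Rhat cutoff and 1000-n_eff floor are common rules of thumb, not fixed standards, and the data frame here stands in for the output of precis(fm, depth = 2)):

# Stand-in for precis(fm, depth = 2) coerced to a data frame.
diag_df <- data.frame(
  n_eff = c(9066, 9668, 9686, 6970),
  Rhat4 = c(0.9998, 1.0006, 1.0000, 1.0000),
  row.names = c("b[1]", "b[2]", "b[3]", "sigma")
)
# Flag parameters whose chains may not have converged or mixed well.
bad <- diag_df$Rhat4 > 1.01 | diag_df$n_eff < 1000
if (any(bad)) {
  warning("Check these parameters: ",
          paste(rownames(diag_df)[bad], collapse = ", "))
} else {
  message("All Rhat values near 1; effective sample sizes are large.")
}

Here nothing is flagged, matching what the traceplots and trankplots below should show: well-mixed, stationary chains.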
# FIXME
traceplot(fm)
trankplot(fm)
Posterior predictive simulation
# FIXME
post <- extract.samples(fm)$b |>
as.data.frame() |>
rename(control = V1,
knee = V2,
eyes = V3)
post |>
pivot_longer(cols = everything(),
names_to = "Treatment",
values_to = "Shift") |>
ggplot(aes(Shift, color = Treatment)) +
geom_density(linewidth = 1.5) +
geom_point(data = JL, aes(x = Shift, y = 0, color = Treatment),
size = 3) +
scale_color_manual(values = c("tomato", "steelblue", "darkorchid")) +
facet_grid(Treatment ~ .) +
cowplot::theme_cowplot() +
  labs(x = "Shift (h)", y = "Density") +
  theme(legend.position = "none")
Summarizing the posterior
# FIXME
post_contrast <- post |>
mutate(`Knee vs. Control` = knee - control,
`Eyes vs. Control` = eyes - control,
.keep = "none")
post_contrast |>
pivot_longer(cols = everything(),
names_to = "Contrast", values_to = "Difference") |>
ggplot(aes(Difference, color = Contrast)) +
scale_color_manual(values = c("steelblue", "darkorchid")) +
  geom_density(linewidth = 1.5)
apply(post_contrast, MARGIN = 2, FUN = HPDI)
      Knee vs. Control Eyes vs. Control
|0.89       -0.6605996        -1.867224
0.89|        0.5862680        -0.626854
The 89% HPDI for the eyes vs. control contrast lies entirely below zero, while the interval for knee vs. control spans zero. This is consistent with the hypothesis: light applied to the eyes, but not to the knee, shifts the circadian rhythm relative to the control.